Prediction of OCR Accuracy

Authors

Luis R. Blando

Junichi Kanai

Thomas A. Nartker

Juan Gonzalez

Abstract

The accuracy of all contemporary OCR technologies varies drastically as a function of input image quality [Rice 92, Rice 93, Chen 93, Rice 94]. Given high quality images, many devices consistently deliver output text in excess of 99% correct. For low quality images, even images which are easily read by a human, output accuracy is frequently below 90%. This extreme sensitivity to quality is well known in the document analysis field and is the subject of much current research. In this ongoing project, we have been interested in developing measures of image quality. We are especially interested in learning to predict OCR accuracy using some combination of image quality measures independent of OCR devices themselves. Reliable algorithms for measuring print quality and predicting OCR accuracy would be valuable in several ways. First, in large scale document conversion operations, they could be used to automatically filter out pages that are more economically recovered via manual entry. Second, they might be employed iteratively as part of an adaptive image enhancement system. At the same time, studies into the nature of image quality can contribute to our overall understanding of the effect of noise on algorithms for classification. In this paper, we propose a prediction technique based upon measuring features associated with degraded characters. In order to limit the scope of the research, the following assumptions are made:

Similar resources

Prediction of OCR accuracy using simple image features

A classifier for predicting the character accuracy achieved by any Optical Character Recognition (OCR) system on a given page is presented. This classifier is based on measuring the amount of white speckle, the amount of character fragments, and overall size information in the page. No output from the OCR system is used. The given page is classified as either “good” quality (i.e., high OCR accu...

Prediction of OCR accuracy using a Neural Network

A method for predicting the accuracy achieved by an OCR system on an input image is presented. It is assumed that there is an ideal prediction function. A neural network is trained to estimate the unknown ideal function. In this project, multilayer perceptrons were trained to predict the character accuracy performance of two OCR systems using the backpropagation training method. The results sho...

Improving OCR Accuracy on Early Printed Books using Deep Convolutional Networks

This paper proposes a combination of a convolutional and an LSTM network to improve the accuracy of OCR on early printed books. While the standard model of line based OCR uses a single LSTM layer, we utilize a CNN- and pooling-layer combination in advance of an LSTM layer. Due to the higher amount of trainable parameters the performance of the network relies on a high amount of training examples t...

Confidence Prediction for Lexicon-Free OCR

Having a reliable accuracy score is crucial for real world applications of OCR, since such systems are judged by the number of false readings. Lexicon-based OCR systems, which deal with what is essentially a multi-class classification problem, often employ methods explicitly taking into account the lexicon, in order to improve accuracy. However, in lexicon-free scenarios, filtering errors requi...

Structured Prediction with Test-time Budget Constraints

We study the problem of structured prediction under test-time budget constraints. We propose a novel approach applicable to a wide range of structured prediction problems in computer vision and natural language processing. Our approach seeks to adaptively generate computationally costly features during test-time in order to reduce the computational cost of prediction while maintaining predictio...

Publication date: 1992
